Survival Analysis

Survival data characteristics

Survival function

= the probability of survival past any time t:

S(t)=Pr(T>t)

Parametric Survival models

Non-Parametric Models

Kaplan Meier Model

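= an estimate of the probability of surviving past t months that accounts for censored observations:

S(t) = \prod_{i: t_i \le t} \left(1 - \Pr(T = t_i \mid T \ge t_i)\right) = \prod_{i: t_i \le t} \left(1 - \frac{d_i}{n_i}\right)

where t_i are the distinct observed event times, d_i is the number of events at t_i, and n_i is the number of subjects still at risk just before t_i.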

Important

Note that the Kaplan-Meier estimator estimates the survival function directly from the observed data, making no assumptions about the underlying hazard function.
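The product over event times can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a survival library: the function name `kaplan_meier` is made up for this note, and the times and event flags are taken from the example dataset further down, assuming Event = 1 marks an observed event and Event = 0 a censored subject.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t_i, S(t_i))] at each distinct event time.

    times  -- observed durations (event time or censoring time)
    events -- 1 if the event was observed, 0 if the subject was censored (assumed coding)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d_i per event time
    curve = []
    survival = 1.0
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # n_i: still under observation at t
        survival *= 1.0 - deaths[t] / at_risk      # multiply by (1 - d_i / n_i)
        curve.append((t, survival))
    return curve

# Times (months) and event flags of the five subjects in the example dataset.
times = [10, 15, 20, 25, 30]
events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(times, events)
```

The curve only drops at months 10, 15, and 25. The censored subjects at months 20 and 30 never cause a drop, but they do count in the risk set until they leave it.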

Hazard Function

= the instantaneous rate of death at time t, given that the subject has survived up to that time
= the chance of dying in a small interval of time, divided by the length of that interval:

h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

H(t) = \int_0^t h(u) \, du

S(t) = \exp\left(-\int_0^t h(u) \, du\right)
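The link between the hazard, the cumulative hazard, and the survival function can be checked numerically. The sketch below assumes a hypothetical hazard that grows linearly with time, h(u) = 0.02u, for which H(t) = 0.01 t^2 in closed form. The function names and the trapezoidal integration are choices made for this illustration only.

```python
import math

def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate H(t), the integral of h(u) over [0, t], with the trapezoidal rule."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * width)
    return total * width

def survival_from_hazard(hazard, t):
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(hazard, t))

# Assumed hazard for illustration: h(u) = 0.02 * u, so H(10) = 0.01 * 10**2 = 1.
s_numeric = survival_from_hazard(lambda u: 0.02 * u, 10.0)
s_closed_form = math.exp(-0.01 * 10.0 ** 2)
```

Both values agree up to floating-point error, which is what S(t) = exp(-H(t)) predicts.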

Cox (Proportional Hazards) Model

= a regression model for survival data that allows us to assess the effect of covariates on survival time while making minimal assumptions about the shape of the hazard function:

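h(t \mid X) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)

where h_0(t) is the baseline hazard (the hazard when all covariates are zero), X_1, ..., X_p are the covariates, and \beta_1, ..., \beta_p are the regression coefficients.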

Inspiring

In other words, it is similar to multiple regression analysis, with the difference that the dependent variable is the hazard function at a given time t.
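Because the baseline hazard h_0(t) is left unspecified, the coefficients are usually estimated by maximizing the partial likelihood, which only compares each subject who has an event with the subjects still at risk at that time. The sketch below assumes there are no tied event times. The function names, the 0/1 treatment encoding, and the coefficient 0.05 are assumptions made for illustration.

```python
import math

def cox_partial_log_likelihood(beta, times, events, covariates):
    """Partial log-likelihood of a Cox model, assuming no tied event times.

    beta       -- list of coefficients, one per covariate
    covariates -- list of rows, one row of covariate values per subject
    """
    # Linear predictor beta_1 * X_1 + ... + beta_p * X_p for every subject.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    log_lik = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored subjects contribute only through the risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        log_lik += scores[i] - math.log(risk)
    return log_lik

def hazard_ratio(beta, x_a, x_b):
    """h(t | x_a) / h(t | x_b); the baseline hazard h_0(t) cancels out."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_a, x_b)))

# Age and treatment (assumed encoding: Drug B = 1, Drug A = 0) from the example dataset.
times = [10, 15, 20, 25, 30]
events = [1, 1, 0, 1, 0]
covariates = [[55, 0], [62, 1], [48, 0], [70, 0], [58, 1]]

ll_at_zero = cox_partial_log_likelihood([0.0, 0.0], times, events, covariates)
# Made-up coefficient: 0.05 per year of age, comparing a 65-year-old with a 55-year-old.
hr = hazard_ratio([0.05, 0.0], [65, 0], [55, 0])
```

With all coefficients at zero, every subject in a risk set is equally likely to be the one who fails, so the log-likelihood is just minus the sum of the log risk-set sizes. The hazard ratio does not depend on t, which is the proportional hazards assumption in action.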

When there are many features, consider penalized Cox models. The penalization (regularization) techniques mirror those used in linear regression, e.g. ridge (L2), lasso (L1), and elastic net penalties.

Tree-structured Survival Models

Tree-structured

  • assumptions
    • The Cox model assumes proportional hazards, meaning the effect of each covariate on the hazard is constant over time.
    • Survival trees do not rely on the proportional hazards assumption.
  • non-linear relationships and high-dimensional data
    • The standard Cox model is linear in the covariates on the log-hazard scale, so it does not capture non-linear relationships unless they are added by hand, and it can struggle with high-dimensional data unless penalized.
    • Survival trees can handle both.

Survival trees

For an individual, a single survival tree predicts the cumulative hazard function (CHF) estimated from all individuals that fall into the same terminal node:

H_h(t) = \sum_{t_{l,h} \le t} \frac{d_{l,h}}{R_{l,h}}

where h is the terminal node, t_{l,h} is the l-th distinct event time in node h, d_{l,h} is the number of events at time t_{l,h}, and R_{l,h} is the number of individuals at risk at time t_{l,h}.
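This sum of events over individuals at risk is the Nelson-Aalen estimator applied to the subjects of one terminal node. A minimal sketch follows. The function name is made up, and the node contents are hypothetical (they reuse the times and event flags of the example dataset, with event = 1 assumed to mean an observed event).

```python
from collections import Counter

def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct event time among the given subjects."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)  # d per event time
    chf, total = [], 0.0
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # R: subjects at risk at time t
        total += deaths[t] / at_risk               # add d / R
        chf.append((t, total))
    return chf

# Hypothetical terminal node that happens to contain five subjects.
node_times = [10, 15, 20, 25, 30]
node_events = [1, 1, 0, 1, 0]
node_chf = nelson_aalen(node_times, node_events)
```

Every individual routed to this node receives the same step function as its prediction.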

Survival random forest

With the CHF of each tree defined as above, the CHF of the entire forest is the average over all trees:

H(t \mid x) = \frac{1}{N} \sum_{i=1}^{N} H_i(t \mid x)

where H_i is the estimated CHF of the terminal node that individual x falls into in the i-th of the N trees.
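The averaging step can be sketched as follows, treating each tree's CHF as a step function stored as sorted (time, value) pairs. The three per-tree CHFs are hypothetical numbers, and the helper names are made up for this note.

```python
from bisect import bisect_right

def step_value(chf, t):
    """Evaluate a step-function CHF, given as sorted (time, value) pairs, at time t."""
    idx = bisect_right([time for time, _ in chf], t)
    return chf[idx - 1][1] if idx > 0 else 0.0  # CHF is 0 before the first event

def forest_chf(tree_chfs, t):
    """Average the per-tree CHFs H_i(t | x) over all N trees."""
    return sum(step_value(chf, t) for chf in tree_chfs) / len(tree_chfs)

# Hypothetical CHFs of the terminal nodes that one individual x lands in, for N = 3 trees.
tree_chfs = [
    [(10, 0.2), (15, 0.45), (25, 0.95)],
    [(10, 0.25), (25, 0.75)],
    [(15, 0.5), (25, 1.0)],
]
h_20 = forest_chf(tree_chfs, 20)  # (0.45 + 0.25 + 0.5) / 3
```

A survival probability can then be derived from the averaged CHF via S(t) = exp(-H(t)).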


Deep Learning Survival Models

#TODO

Example of usage

Given such a dataset:

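| Subject | Time (months) | Event | Age (years) | Gender | Treatment |
|---|---|---|---|---|---|
| 1 | 10 | 1 | 55 | M | Drug A |
| 2 | 15 | 1 | 62 | F | Drug B |
| 3 | 20 | 0 | 48 | M | Drug A |
| 4 | 25 | 1 | 70 | F | Drug A |
| 5 | 30 | 0 | 58 | M | Drug B |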
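Survival models typically need three things from such a table: a duration per subject, an event indicator, and numeric covariates. The sketch below shows one possible encoding. It assumes Event = 1 means the event was observed and Event = 0 means the subject was censored, and the 0/1 codes for Gender and Treatment are arbitrary choices for illustration.

```python
# Rows copied from the table: (subject, months, event, age, gender, treatment).
rows = [
    (1, 10, 1, 55, "M", "Drug A"),
    (2, 15, 1, 62, "F", "Drug B"),
    (3, 20, 0, 48, "M", "Drug A"),
    (4, 25, 1, 70, "F", "Drug A"),
    (5, 30, 0, 58, "M", "Drug B"),
]

durations = [months for _, months, _, _, _, _ in rows]
observed = [event for _, _, event, _, _, _ in rows]

# Assumed encoding: Gender F -> 1, M -> 0; Treatment Drug B -> 1, Drug A -> 0.
covariates = [
    [age, int(gender == "F"), int(treatment == "Drug B")]
    for _, _, _, age, gender, treatment in rows
]

n_events = observed.count(1)
n_censored = observed.count(0)
```

Subjects 3 and 5 are censored under this reading: they were still event-free when last observed at 20 and 30 months.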

Model evaluation

Graphical Evaluation

Measure the discrimination ability of survival models

Concordance Index (C-index, or Harrell's C-index)

= quantifies the model's ability to correctly rank the relative risks or predicted survival probabilities of pairs of subjects:

C = \frac{\#\text{concordant pairs} + 0.5 \times \#\text{risk ties}}{\#\text{permissible pairs}}
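The counting behind this ratio can be sketched directly. In this simplified version a pair is permissible when the subject with the shorter time had an observed event, and pairs with tied times are skipped. The risk scores are made-up model outputs, where a higher score is assumed to mean higher risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (simplified: tied times are skipped)."""
    concordant = ties = permissible = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Permissible pair: i has an observed event strictly before j's time.
            if events[i] == 1 and times[i] < times[j]:
                permissible += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1   # earlier event has the higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    ties += 1         # tied predictions count as half
    return (concordant + 0.5 * ties) / permissible

# Times and event flags from the example dataset; risk scores are hypothetical.
times = [10, 15, 20, 25, 30]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.5, 0.5, 0.7, 0.1]
c_index = concordance_index(times, events, risk_scores)  # (6 + 0.5 * 1) / 8
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of the permissible pairs.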

Time-dependent Area Under the Curve (AUC)

= the AUC computed at different time points, which shows how well the model discriminates at each point and how its predictive accuracy changes over time
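One common variant is the cumulative/dynamic AUC: at a horizon t, subjects with an observed event by t are the cases and subjects still event-free after t are the controls. The sketch below is deliberately simplified. It drops subjects censored before t and applies no inverse probability of censoring weights, which established estimators use to correct for censoring. The function name and the risk scores are hypothetical.

```python
def cumulative_dynamic_auc(times, events, risk_scores, horizon):
    """Naive AUC at one horizon: cases had an event by `horizon`, controls survived past it.

    Simplification: subjects censored before `horizon` are ignored (no censoring weights).
    """
    cases = [r for t, e, r in zip(times, events, risk_scores) if e == 1 and t <= horizon]
    controls = [r for t, r in zip(times, risk_scores) if t > horizon]
    if not cases or not controls:
        return None  # AUC is undefined without at least one case and one control
    wins = sum(
        1.0 if case > control else 0.5 if case == control else 0.0
        for case in cases
        for control in controls
    )
    return wins / (len(cases) * len(controls))

# Times and event flags from the example dataset; risk scores are hypothetical.
times = [10, 15, 20, 25, 30]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.5, 0.5, 0.7, 0.1]
auc_by_horizon = {t: cumulative_dynamic_auc(times, events, risk_scores, t) for t in (12, 22)}
```

Evaluating several horizons, as in `auc_by_horizon`, shows how discrimination changes over follow-up time.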